Journal of Medical Imaging — Latest Matching Preprints

1

Generative modeling of histology tissue reduces human annotation effort for segmentation model development.

Lutnick, B. R.; Sarder, P.

2021-10-16 pathology 10.1101/2021.10.15.464564 medRxiv

Top 0.1%

10.5%

Show abstract

Segmentation of histology tissue whole side images is an important step for tissue analysis. Given enough annotated training data modern neural networks are capable accurate reproducible segmentation, however, the annotation of training datasets is time consuming. Techniques such as human in the loop annotation attempt to reduce this annotation burden, but still require a large amount of initial annotation. Semi-supervised learning, a technique which leverages both labeled and unlabeled data to learn features has shown promise for easing the burden of annotation. Towards this goal, we employ a recently published semi-supervised method: datasetGAN for the segmentation of glomeruli from renal biopsy images. We compare the performance of models trained using datasetGAN and traditional annotation and show that datasetGAN significantly reduces the amount of annotation required to develop a highly performing segmentation model. We also explore the usefulness of using datasetGAN for transfer learning and find that this greatly enhances the performance when a limited number of whole slide images are used for training.

2

Post-operative tissue fragment puzzling using histopathological vision transformer alignment HiViTAlign

Blattgerste, C. A.; Ferdous, T.; Jessen, A.; Legnar, M.; Rohr, K.; Scherl, C.; Hesser, J.; Weis, C.-A.

2025-07-18 pathology 10.1101/2025.07.14.664649 medRxiv

Top 0.1%

8.7%

Show abstract

1In pathology, reconstructing adjacent tissue parts enables an overview of the macro environment of objects like tumors. Especially, malignoma are of interest to verify invasion and resection margins, as patients with positive margins face a higher mortality risk. Reassembling image fragments is widely used in other domains, but adjacent blocks in pathology are mostly analyzed separately missing global context. In this project, neighboring tissue of pig organ whole slide images (WSI) are reconstructed without a ground truth based on histological sections at the end of a complex work-up process. Histological tissue slices with artifacts, frayed or disrupted boundaries and sometimes missing pieces complicate the puzzling task. Thus, typical approaches such as direct feature comparison of tissue boundaries or estimating a tiles position based on an overview image or a known structures are not applicable. A new approach is presented using partial image registration where only parts of a fixed and a moving image are aligned for adjacency. In contrast to existing projects aligning subsequent tissue slices of the same block, WSIs from separated blocks will be reassembled for adjacency. The used three stage vision transformer extracts image features on various scales, compares neighboring tiles by shape, color and texture and predicts transformation parameters. Even though the pipeline is capable of handling rigid transformation such as rotation or reflection, only translation is currently supported due to the limited training set. Supervised training of the network can be realized using a puzzle generator creating irregular shaped fragments of masked whole slide images. The factorized trained neural network is embedded into a sophisticated histopathological vision transformer alignment (HiViTAlign) pipeline executing the following steps in roughly 10 seconds per reassembled tissue puzzle: First, extract the specimen and mask the background in each whole slide image. Second, compare tile boundaries using partial image registration. Third, calculate the adjacency by boundary proximity for each image pair. Fourth, determine a minimal spanning tree to optimize adjacency of pairwise registrations and transformations for tissue reconstruction. The python source code for HiViTAlign to start puzzling with WSIs or other objects is available at https://github.com/cpheidelberg/HiViTAlign. The generator for creating a dataset with irregular shaped tiles can be downloaded from https://github.com/cpheidelberg/ImagePuzzleGenerator. 2 Author summaryHistopathology as the microscopic analysis of tissue remains the gold standard for evaluating tumors, especially when assessing resection margins. However, the physical processing of tissue disrupts its original three dimensional structure, leaving pathologists with fragmented, two-dimensional slices that lack spatial context. This fragmentation makes it difficult to understand the full extent and orientation of tumors and to correlate pathology results with radiological imaging used in surgical planning. In this study, we present a computational pipeline for histopathological vision transformer alignment (HiViTAlign) that reassembles fragmented histological tissue sections, similar to solving a jigsaw puzzle. Using a deep learning model based on Vision Transformers, our method predicts how individual tissue fragments are spatially related and outputs transformation parameters for adjacency. While the pipeline is designed to accommodate a variety of rigid transformations (e.g., rotation and scaling), its current implementation, constrained by the limited diversity of the training dataset, focuses solely on predicting translational shifts between fragments. A custom dataset generator was developed to create realistic puzzles from whole slide images, assigning original coordinates to each fragment to enable supervised training. The full pipeline was evaluated on both synthetic datasets and real-world whole slide images, demonstrating its ability to reconstruct tissue cross-sections without requiring a reference image. This method may support more accurate spatial interpretation of pathological specimens and better integration with surgical imaging data. The open-source Python code, we developed, invites collaboration and innovation, reflecting our commitment to advancing computational pathology through technology and shared resources. Paper to be submitted to PLOS Computational Biology.

3

Cardiac Magnetic Resonance Imaging in the German National Cohort: Automated Segmentation of Short-Axis Cine Images and Post-Processing Quality Control

Full, P. M.; Schirrmeister, R. T.; Hein, M.; Russe, M. F.; Reisert, M.; Ammann, C.; Greiser, K. H.; Niendorf, T.; Pischon, T.; Schulz-Menger, J.; Maier-Hein, K. H.; Bamberg, F.; Rospleszcz, S.; Schlett, C. L.; Schuppert, C.

2025-05-21 radiology and imaging 10.1101/2025.05.20.25328013 medRxiv

Top 0.1%

6.7%

Show abstract

PurposeTo develop a segmentation and quality control pipeline for short-axis cardiac magnetic resonance (CMR) cine images from the prospective, multi-center German National Cohort (NAKO). Materials and MethodsA deep learning model for semantic segmentation, based on the nnU-Net architecture, was applied to full-cycle short-axis cine images from 29,908 baseline participants. The primary objective was to determine data on structure and function for both ventricles (LV, RV), including end diastolic volumes (EDV), end systolic volumes (ESV), and LV myocardial mass. Quality control measures included a visual assessment of outliers in morphofunctional parameters, inter- and intra-ventricular phase differences, and LV time-volume curves (TVC). These were adjudicated using a five-point rating scale, ranging from five (excellent) to one (non-diagnostic), with ratings of three or lower subject to exclusion. The predictive value of outlier criteria for inclusion and exclusion was analyzed using receiver operating characteristics. ResultsThe segmentation model generated complete data for 29,609 participants (incomplete in 1.0%) and 5,082 cases (17.0 %) were visually assessed. Quality assurance yielded a sample of 26,899 participants with excellent or good quality (89.9%; exclusion of 1,875 participants due to image quality issues and 835 cases due to segmentation quality issues). TVC was the strongest single discriminator between included and excluded participants (AUC: 0.684). Of the two-category combinations, the pairing of TVC and phases provided the greatest improvement over TVC alone (AUC difference: 0.044; p<0.001). The best performance was observed when all three categories were combined (AUC: 0.748). Extending the quality-controlled sample to include acceptable quality ratings, a total of 28,413 (95.0%) participants were available. ConclusionThe implemented pipeline facilitated the automated segmentation of an extensive CMR dataset, integrating quality control measures. This methodology ensures that ensuing quantitative analyses are conducted with a diminished risk of bias.

4

Comparing 3D, 2.5D, and 2D Approaches to Brain Image Segmentation

Avesta, A. E.; Hossain, S.; Lin, M.; Aboian, M.; Krumholz, H.; Aneja, S.

2022-11-21 radiology and imaging 10.1101/2022.11.03.22281923 medRxiv

Top 0.1%

6.6%

Show abstract

Deep-learning methods for auto-segmenting brain images either segment one slice of the image (2D), five consecutive slices of the image (2.5D), or an entire volume of the image (3D). Whether one approach is superior for auto-segmenting brain images is not known. We compared these three approaches (3D, 2.5D, and 2D) across three auto-segmentation models (capsule networks, UNets, and nnUNets) to segment brain structures. We used 3430 brain MRIs, acquired in a multi-institutional study, to train and test our models. We used the following performance metrics: segmentation accuracy, performance with limited training data, required computational memory, and computational speed during training and deployment. 3D, 2.5D, and 2D approaches respectively gave the highest to lowest Dice scores across all models. 3D models maintained higher Dice scores when the training set size was decreased from 3199 MRIs down to 60 MRIs. 3D models converged 20% to 40% faster during training and were 30% to 50% faster during deployment. However, 3D models require 20 times more computational memory compared to 2.5D or 2D models. This study showed that 3D models are more accurate, maintain better performance with limited training data, and are faster to train and deploy. However, 3D models require more computational memory compared to 2.5D or 2D models.

5

Self-supervised Learning for Chest CT - Training Strategies and Effect on Downstream Applications

Tariq, A.; Patel, B.; Banerjee, I.

2024-02-05 radiology and imaging 10.1101/2024.02.01.24302144 medRxiv

Top 0.1%

6.5%

Show abstract

Self-supervised pretraining can reduce the amount of labeled training data needed by pre-learning fundamental visual characteristics of the medical imaging data. In this study, we investigate several self-supervised training strategies for chest computed tomography exams and their effects of downstream applications. we bench-mark five well-known self-supervision strategies (masked image region prediction, next slice prediction, rotation prediction, flip prediction and denoising) on 15M chest CT slices collected from four sites of Mayo Clinic enterprise. These models were evaluated for two downstream tasks on public datasets; pulmonary embolism (PE) detection (classification) and lung nodule segmentation. Image embeddings generated by these models were also evaluated for prediction of patient age, race, and gender to study inherent biases in models understanding of chest CT exams. Use of pretraining weights, especially masked regions prediction based weights, improved performance and reduced computational effort needed for downstream tasks compared to task-specific state-of-the-art (SOTA) models. Performance improvement for PE detection was observed for training dataset sizes as large as [Formula] with maximum gain of 5% over SOTA. Segmentation model initialized with pretraining weights learned twice as fast as randomly initialized model. While gender and age predictors built using self-supervised training weights showed no performance improvement over randomly initialized predictors, the race predictor experienced a 10% performance boost when using self-supervised training weights. We released models and weights under open-source academic license. These models can then be finetuned with limited task-specific annotated data for a variety of downstream imaging tasks thus accelerating research in biomedical imaging informatics.

6

Simplification of free-running cardiac magnetic resonance by respiratory phase using principal component analysis

Shammi, U. A.; Luan, Z.; Xu, J.; Hamid, A.; Flors, L.; Cassani, J.; Altes, T. A.; Thomen, R. P.; Van Doren, S. R.

2022-08-30 radiology and imaging 10.1101/2022.08.29.22279299 medRxiv

Top 0.1%

6.3%

Show abstract

Cardiac magnetic resonance imaging (CMR) provides many cardiac functional insights. The reliance of standard cine CMR upon breath holds is not feasible for some patients. Its process of combining multiple heartbeats is unsuited to arrhythmias. Real-time cine methods sidestep these problems but can introduce respiratory displacement of the heart. To aid CMR acquisitions during breathing, we developed post-processing software to diminish the effects of respiratory displacement of the heart. It uses principal component analysis to resolve respiratory motions from cardiac cycles in the dynamic image. The software groups heartbeats from expiration and inspiration to decrease the appearance of respiratory motion. The effects of respiratory motion and such motion correction were evaluated on short-axis views (acquired with compressed sensing) of 11 healthy subjects and 8 cardiac patients. The smallest correlation coefficients between end-systolic frames of the original dynamic scans averaged 0.79. After segregation of cardiac cycles by respiratory phase, the mean correlation coefficients between cardiac cycles were 0.94 {+/-} 0.03 at end-expiration and 0.90 {+/-} 0.08 at end-inspiration. The improvements in correlation coefficients were significant in paired t-tests, i.e., P [≤] 0.01 for healthy subjects and P [≤] 0.001 for heart patients at end-expiration. Two expert cardiothoracic radiologists, blinded to the processing, assessed the dynamic images in terms of blood-myocardial contrast, endocardial interface definition, and motion artifacts. Clinical assessment preferred cardiac cycles during end-expiration, which maintained or enhanced scores in 90% of healthy subjects and 83% of the heart patients. Performance remained high in a case of arrhythmia and irregular breathing. Heartbeats collected from end-expiration reliably mitigated respiratory motion when the new software was applied to DICOM files from real-time acquisitions.

7

Learning to leverage salient regions in neuro-oncology using Deap Learning

Grigis, A.; Alentorn, A.; Frouin, V.

2020-10-23 cancer biology 10.1101/2020.10.22.350421 medRxiv

Top 0.1%

6.3%

Show abstract

Developing an automatic tumor detector for MRI medical images is a major challenge in neuro-oncology. The availability of such a tool would be a valuable assistance for the radiologists. Numerous works have tried to segment the tumor tissues, others have attempted to localize the tumor globally. In this work we focus on this second class of methods and we compare two drastically different strategies. The first one is an assumption-free anomaly detector build over a Variational Auto-Encoder (VAE), and the second one is a VGG classifier that embed Attention-Gated (AG) units to focus on the target structures at almost no additional computational cost. This comparison is first conducted on the publicly available BraTS glioma dataset for which published performance results can serve as reference, and extended as such (ie., without transfer learning) to two internal image datasets, namely Primary Central Nervous System Lymphoma (PCNSL) and Metastasis. The results demonstrate that the VAE and AG-VGG strategies can be used, up to a certain extent, to localize brain tumors.

8

UCSF RMaC: University of California San Francisco 3D Multi-Phase Renal Mass CT Dataset with Tumor Segmentations

Sahin, S.; Diaz, E.; Rajagopal, A.; Abtahi, M.; Jones, S.; Dai, Q.; Kramer, S.; Wang, Z.; Larson, P. E. Z.

2026-02-12 radiology and imaging 10.64898/2026.02.11.26346096 medRxiv

Top 0.1%

6.3%

Show abstract

Current standard of care imaging practices cannot reliably differentiate among certain renal tumors such as benign oncocytoma and clear cell renal cell carcinoma (RCC), and between low and high grade RCCs. Previous work has explored using deep learning, radiomics, and texture analysis to predict renal tumor subtypes and differentiate between low and high grade RCCs with mixed success. To further this work, large diverse datasets are needed to improve model performance and provide strong evaluation sets. In this work, a dataset of 831 multi-phase 3D CT exams was curated. Each exam contains up to three contrast-enhanced CT phases. Tumor outlines or bounding boxes were annotated and registered to the image volumes. The pathology results for each tumor and relevant patient metadata are also included.

9

Retrieval-Augmented Claude Opus 4.7 and GPT-5.5 Surpass Human Performance on the Nuclear Cardiology Board Preparation Exam (and Claude Drafts a Paper About it)

Killekar, A.; Shanbhag, A.; Miller, R. J.; Dey, D.; Bourque, J.; Phillips, L.; Chareonthaitawee, P.; Slomka, P.

2026-05-13 radiology and imaging 10.64898/2026.05.08.26352768 medRxiv

Top 0.1%

5.1%

Show abstract

BackgroundPrevious studies evaluated large language model (LLM) performance on the American Society of Nuclear Cardiology (ASNC) Board Preparation Exam. Without domain-specific context, the best model (GPT-4o) achieved 63.1%, below the estimated 65% passing threshold and the 78% mean score of human fellows-in-training (FITs). Providing textbook context improved GPT-4o to 73.8% on text-only questions, but still fell short of human trainees. Whether next-generation LLMs with retrieval-augmented generation (RAG) can exceed this gap is unknown. MethodsClaude Opus 4.7 and GPT-5.5 were administered all 168 questions (141 text-only, 27 image-based) from the 2023 ASNC Board Preparation Exam across 5 iterations each, using RAG with a nuclear cardiology textbook, companion atlas, and ASNC clinical guidelines. Claude used local FAISS-based semantic retrieval; GPT-5.5 used Azures cloud-hosted vector store. Performance was compared to prior LLM results and 13 human FITs. ResultsAcross 5 iterations, Claude Opus 4.7 achieved a mean accuracy of 86.3% {+/-} 1.4% (text 88.8%, image 73.3%). GPT-5.5 achieved 86.7% {+/-} 2.2% (text 88.5%, image 77.0%) but refused a mean of 12.2 questions (7.3%) per iteration due to safety filters. Both models surpassed the human FIT mean (78.0%) and the estimated passing threshold. Compared to GPT-4o without context (63.1%), this represents a 23-percentage-point improvement in 18 months. ConclusionNext-generation LLMs with RAG now surpass average human trainee performance on nuclear cardiology board preparation questions, suggesting significant potential as educational tools and knowledge-reference aids in cardiovascular imaging. Condensed AbstractAcross 5 iterations each, Claude Opus 4.7 and GPT-5.5 with retrieval-augmented generation achieved mean accuracies of 86.3% and 86.7% on the 2023 ASNC Board Preparation Exam (168 questions), both surpassing the mean human fellow-in-training score of 78%. GPT-5.5 refused a mean of 12.2 questions (7.3%) per iteration due to safety filters. These results represent a 23-percentage-point improvement over the best prior LLM without context (63.1%), demonstrating that RAG-enhanced LLMs have reached human-level proficiency in nuclear cardiology knowledge. Graphical Abstract O_FIG O_LINKSMALLFIG WIDTH=200 HEIGHT=111 SRC="FIGDIR/small/26352768v2_ufig1.gif" ALT="Figure 1"> View larger version (49K): org.highwire.dtl.DTLVardef@5f2465org.highwire.dtl.DTLVardef@4e80d3org.highwire.dtl.DTLVardef@1ebbb93org.highwire.dtl.DTLVardef@167d3c1_HPS_FORMAT_FIGEXP M_FIG C_FIG Overview of the three-study research arc evaluating LLM performance on the 2023 ASNC Board Preparation Exam. Study 1 (2024) tested four LLMs without context (best: GPT-4o, 63.1%). Study 2 (2025) added textbook context to GPT-4o (73.8%). Study 3 (2026, current) evaluated Claude Opus 4.7 and GPT-5.5 with retrieval-augmented generation across 5 iterations each (mean 86.3% and 86.7%, respectively), both surpassing the human fellow-in-training mean of 78%. Right panel shows the performance scale with key thresholds.

10

Analysis Of Augmentation Techniques for Spine X-Ray Images

Sivakumar, E.; Anand, A.

2026-04-17 radiology and imaging 10.64898/2026.04.15.26350121 medRxiv

Top 0.1%

5.0%

Show abstract

Computer vision and deep learning techniques, including convolutional neural networks (CNNs) and transformers, have increased the performance of medical image classification systems. However, training deep learning models using medical images is a challenging task that necessitates a substantial amount of annotated data. In this paper, we implement data augmentation strategies to tackle dataset imbalance in the VinDr-SpineXR dataset, which has a lower number of spine abnormality X-ray images compared to normal spine X-ray images. Geometric transformations and synthetic image generation using Generative Adversarial Networks are explored and applied to the abnormal classes of the dataset, and classifier performance is validated using VGG-16 and InceptionNet to identify the most effective augmentation technique. Additionally, we introduce a hybrid augmentation technique that addresses class imbalance, reduces computational overhead relative to a GAN-only approach, and achieves [~]99% validation accuracy with both classifiers across all three case studies.

11

It is All in the Details: Guiding Fine-Feature Characteristics in Artificial Medical Images using Diffusion Models

Hofmeijer, E. I. S.; Zu, D.; van der Heijden, F.; Tan, C. O.

2025-05-02 radiology and imaging 10.1101/2025.05.01.25326784 medRxiv

Top 0.1%

4.9%

Show abstract

We sought to develop a diffusion model-based framework that guides both larger anatomical structures and fine features to generate radiographic images that accurately reflect pathological characteristics. The model is based on a latent diffusion model that is extended to include coarse- and fine-feature guidance. The feedback of an independent classifier network, trained to identify malignant features, was used to provide the fine-feature guidance. We compared the accuracy of this model to that attained by one without fine-feature guidance and by a standard generative adversarial network. We used the area under ROC to compare accuracy across the networks in representing malignant features of lung nodules and gliomas on 44,924 lung CT and 6,376 MRI 2D images (annotated by trained radiologists). Statistical significance was assessed using bootstrapped p-values. For each dataset, the model generated artificial images comparable to original ones. Benign vs malignant classification accuracy without fine-feature guidance was 70% (CT), 81% (MRI). Fine-feature guidance increased the accuracy to 85.5%, 86%, for, respectively, CT and MRI images (vs unguided, p < 0.001, p < 0.001). It is feasible to use independent classifier guidance to create artificial radiographic images that accurately reflect fine features across pathologies and imaging modalities.

12

Deep-Learning-Enabled Differentiation between Intraprostatic Gold Fiducial Markers and Calcification in Quantitative Susceptibility Mapping

Stewart, A. W.; Goodwin, J.; Richardson, M.; Robinson, S. D.; O'Brien, K.; Jin, J.; Barth, M.; Bollmann, S.

2023-10-31 pathology 10.1101/2023.10.26.564293 medRxiv

Top 0.1%

4.9%

Show abstract

PurposeInterest is growing in MR-only radiotherapy (RT) planning for prostate cancer (PCa) due to the potential reductions in cost and patient exposure to radiation, and a more streamlined work-flow and patient imaging pathway. However, in MRI, the gold fiducial markers (FMs) used for target localization appear as signal voids, complicating differentiation from other void sources such as calcifications and bleeds. This work investigates using Quantitative Susceptibility Mapping (QSM), an MRI phase post-processing technique, to aid in the differentiation task. It also presents deep learning models that capture nuanced information and automate the segmentation task, facilitating a streamlined approach to MR-only RT. MethodsCT and MRI, including GRE and T1-weighted imaging, were acquired from 26 PCa patients, each with three implanted gold FMs. GRE data were post-processed into QSM, T 2*, and R2* maps using QSMxTs body imaging pipeline. Statistical analyses were conducted to investigate the quantitative differentiation of FMs and calcification in each contrast. 3D U-Nets were developed using fastMONAI to automate the segmentation task using various combinations of MR-derived contrasts, with a model trained on CT used as a baseline. Models were evaluated using precision and recall calculated using a leave-one-out cross-validation scheme. ResultsSignificant differences were observed between FM and calcification regions in CT, QSM and T 2*, though overlap was observed in QSM and T 2*. The baseline CT U-Net achieved an FM-level precision of {approx} 98% and perfect recall. The best-performing QSM-based model achieved precision and recall of 80% and 90%, respectively, while conventional MRI had values below 70% and 80%, respectively. The QSM-based model produced segmentations with good agreement with the ground truth, including a challenging FM that coincided with a bleed. ConclusionThe model performance highlights the value of using QSM over indirect measures in MRI, such as signal voids in magnitude-based contrasts. The results also indicate that a U-Net can capture more information about the presentation of FMs and other sources than would be possible using susceptibility quantification alone, which may be less reliable due to the diverse presentation of sources across a patient population. In our study, QSM was a reliable discriminator of FMs and other sources in the prostate, facilitating an accurate and streamlined approach to MR-only RT.

13

Performance of Naiive Spectral Geometric Models in Histopathology AI

Leyva, A.; Niazi, M. K. K.

2026-02-02 pathology 10.64898/2026.01.30.702908 medRxiv

Top 0.1%

4.9%

Show abstract

There have been no systematic evaluations of purely spectral models for digital pathology tasks. We implemented and benchmarked four pipelines: binary classification on the BreaKHis dataset, multi-class region classification in glioblastoma, spatial transcriptomics, and denoising on Visium 10x. Across all tasks, extensive cross-validation and grouped splits showed that purely spectral models did not improve performance over CNN-only baselines, but offer useful complementary tools for interpretability and processing. Denoising showed strong performance that proves utility in data-scarce or heterogeneous image environments. Equivalence testing confirms that spectral and CNN model performances fall outside {+/-}3% AUC. Fusion models between CNNs and spectral models show higher balanced accuracy. Spectral models failed to generalize across spatial transcriptomics tasks, with low correlation despite stable training loss. These findings represent a systematic negative result: despite their theoretical richness, spectral geometric features and SNO embeddings prove to be complementary features for WSI classification or segmentation. Reporting such outcomes is essential to establish empirical boundaries for spectral methods and to encourage future work on conditions or data modalities where these approaches may hold greater promise.

14

Beyond Algorithms: The Impact of Simplified CNN Models and Multifactorial Influences on Radiological Image Analysis

Mohammadi, S.; Mohanty, A. S.; Saikali, S.; Rose, D.; LynnHtaik, W.; Greaves, R.; Lounes, T.; Haque, E.; Hirani, A.; Zahiri, J.; Dehzangi, I.; Patel, V.; Khosravi, P.

2024-09-16 radiology and imaging 10.1101/2024.09.15.24313585 medRxiv

Top 0.1%

4.8%

Show abstract

This paper demonstrates that simplified Convolutional Neural Network (CNN) models can outperform traditional complex architectures, such as VGG-16, in the analysis of radiological images, particularly in datasets with fewer samples. We introduce two adopted CNN architectures, LightCnnRad and DepthNet, designed to optimize computational efficiency while maintaining high performance. These models were applied to nine radiological image datasets, both public and in-house, including MRI, CT, X-ray, and Ultrasound, to evaluate their robustness and generalizability. Our results show that these models achieve competitive accuracy with lower computational costs and resource requirements. This finding underscores the potential of streamlined models in clinical settings, offering an effective and efficient alternative for radiological image analysis. The implications for medical diagnostics are significant, suggesting that simpler, more efficient algorithms can deliver better performance, challenging the prevailing reliance on transfer learning and complex models. The complete codebase and detailed architecture of the LightCnnRad and DepthNet, along with step-by-step instructions, are accessible in our GitHub repository at https://github.com/PKhosravi-CityTech/LightCNNRad-DepthNet.

15

Transfer Learning for Medical Imaging: An Empirical Evaluation of CNN Architectures on Chest Radiographs

Salve, H. S.

2026-01-08 radiology and imaging 10.64898/2026.01.07.26343591 medRxiv

Top 0.1%

4.8%

Show abstract

This paper presents a comprehensive comparative study of five state-of-the-art CNN architectures, VGG19, ResNet50, InceptionV3, DenseNet121, and EfficientNetB0 for multi-class classification of Chest X-ray images (CXR) into four categories: Edema, Normal, Pneumonia, and Tuberculosis (TB). The models were trained, validated, and tested on a dataset comprising 6,092 training and 325 testing images across four distinct classes. Each architecture was initialized with ImageNet weights, augmented with a custom classifier, and fine-tuned under identical conditions to ensure a fair comparison. The models are evaluated on a comprehensive set of metrics, including accuracy, per-class recall, training time, and model complexity. Experimental results indicate that VGG19 achieved the highest classification accuracy of 98.15%, followed closely by ResNet50 at 97.54%. This study provides empirical evidence to guide the selection of appropriate deep learning models for chest X-ray diagnosis, balancing performance with operational constraints

16

Cardiovascular magnetic resonance reference ranges for cardiac function and structure and recommendations for grading severity: the Healthy Hearts Consortium

Szabo, L. E.; McCracken, C.; Condurache, D. G.; Bulow, R.; Aquaro, G. D.; Andre, F.; Thu-Thao, L.; Sucha, D.; Salih, A. M.; Roy, R.; Salatzki, J.; Aung, N.; Chadalavada, S.; Lee, A. M.; Harvey, N. C.; Leiner, T.; Chin, C. W. L.; Friedrich, M. G.; Barison, A.; Dorr, M.; Raisi-Estabragh, Z.; Petersen, S. E.

2025-04-08 radiology and imaging 10.1101/2025.04.07.25325351 medRxiv

Top 0.1%

4.8%

Show abstract

IntroductionCardiovascular magnetic resonance (CMR) imaging offers precise quantification of cardiac structure and function. However, its clinical utility is often limited by the absence of robust, standardized reference ranges and severity grading thresholds. AimsThe aim of this study was to establish age-, sex-, and ethnicity-specific reference ranges and severity grading criteria for CMR-derived ventricular and atrial parameters in healthy adults, accounting for variations between two post-processing software tools. Methods and resultsWe analyzed CMR scans from the Healthy Hearts Consortium (HHC), which includes six multi-ethnic international cohorts. Images were automatically segmented using cvi42 (Circle Cardiovascular Imaging) and suiteHEART (Neosoft), with visual and statistical quality control. Ventricular and atrial volumes, myocardial mass, and ejection fractions were derived using short- and long-axis protocols; parameters were indexed to body surface area and height. We defined reference ranges as normal up to the 95% of the prediction interval (PI), and abnormalities as mild up to 99.73%, moderate at 99.73%, and severe at 99.99%, respectively. The final dataset included 4,624 women (51.0%) and 4,435 men (49.0%), with a mean age of 61 {+/-} 13 years (range 18-83), and a multi-ethnic population (81.6% White, 5.6% South Asian, 5.3% Mixed/Other, 3.8% Black, 3.7% Chinese). Minor systematic differences were observed between cvi42 and suiteHEART, particularly in atrial parameters. ConclusionsOur work provides an evidence-based framework for CMR severity grading, offering age-, sex-, and ethnicity-stratified thresholds for mild, moderate, and severe deviations from the reference. These reference values support improved diagnostic accuracy, better risk stratification, and enhanced comparability of CMR findings worldwide. Graphical abstract O_FIG O_LINKSMALLFIG WIDTH=200 HEIGHT=120 SRC="FIGDIR/small/25325351v1_ufig1.gif" ALT="Figure 1"> View larger version (38K): org.highwire.dtl.DTLVardef@1fbadc2org.highwire.dtl.DTLVardef@4ccfdcorg.highwire.dtl.DTLVardef@e2ad16org.highwire.dtl.DTLVardef@1d1bb4d_HPS_FORMAT_FIGEXP M_FIG C_FIG Footnote: This graphical abstract summarises the methodology and findings of our study on severity grading using cardiovascular magnetic resonance (CMR). It illustrates the dataset, quality control steps, software tools used and the derivation of population-specific reference ranges and severity grading classification. All reference ranges are available on the Healthy Hearts Consortium website (www.healthy-hearts.org.uk). Abbreviations: CMR: cardiovascular magnetic resonance; QC: quality control; EDV: end-diastolic volume; ESV: end-systolic volume; SV: stroke volume; EF: ejection fraction; LV: left ventricle.

17

Making Densenet Interpretable A Case Study In Clinical Radiology

Ngan, K. H.; Garcez, A. d.; Knapp, K. M.; Appelboam, A.; Reyes-Aldasoro, C. C.

2019-12-05 radiology and imaging 10.1101/19013730 medRxiv

Top 0.1%

4.8%

Show abstract

The monotonous routine of medical image analysis under tight time constraints has always led to work fatigue for many medical practitioners. Medical image interpretation can be error-prone and this can increase the risk of an incorrect procedure being recommended. While the advancement of complex deep learning models has achieved performance beyond human capability in some computer vision tasks, widespread adoption in the medical field has been held back, among other factors, by poor model interpretability and a lack of high-quality labelled data. This paper introduces a model interpretation and visualisation framework for the analysis of the feature extraction process of a deep convolutional neural network and applies it to abnormality detection using the musculoskeletal radiograph dataset (MURA, Stanford). The proposed framework provides a mechanism for interpreting DenseNet deep learning architectures. It aims to provide a deeper insight about the paths of feature generation and reasoning within a DenseNet architecture. When evaluated on MURA at abnormality detection tasks, the model interpretation framework has been shown capable of identifying limitations in the reasoning of a DenseNet architecture applied to radiography, which can in turn be ameliorated through model interpretation and visualization.

18

Structural Knowledge Transfer of Panoptic Kidney Segmentation to Other Stains, Organs, and Species

Ginley, B. G.; Jen, K.-Y.; Sarder, P.

2021-10-23 pathology 10.1101/2021.10.21.465370 medRxiv

Top 0.1%

4.5%

Show abstract

BackgroundPanoptic segmentation networks are a newer class of image segmentation algorithms that are constrained to understand the difference between instance-type objects (objects that are discrete countable entities, such as renal tubules) and group-type objects (uncountable, amorphous regions of texture such as renal interstitium). This class of deep networks has unique advantages for biological datasets, particularly in computational pathology. MethodsWe collected 126 periodic acid Schiff whole slide images of native diabetic nephropathy, lupus nephritis, and transplant surveillance kidney biopsies, and fully annotated them for the following micro-compartments: interstitium, glomeruli, globally sclerotic glomeruli, tubules, and arterial tree (arteries/arterioles). Using this data, we trained a panoptic feature pyramid network. We compared performance of the network against a renal pathologists annotations, and the methods transferability to other computational pathology domain tasks was investigated. ResultsThe panoptic feature pyramid networks showed high performance as compared to renal pathologist for all of the annotated classes in a testing set of transplant kidney biopsies. The network was not only able to generalize its object understanding across different stains and species of kidney data, but also across several organ types. ConclusionsPanoptic networks have unique advantages for computational pathology; namely, these networks internally model structural morphology, which aids bootstrapping of annotations for new computational pathology tasks.

19

PathFlow-MixMatch for Whole Slide Image Registration: An Investigation of a Segment-Based Scalable Image Registration Method

Levy, J. J.; Jackson, C. R.; Haudenschild, C. C.; Christensen, B. C.; Vaickus, L. J.

2020-03-24 pathology 10.1101/2020.03.22.002402 medRxiv

Top 0.1%

4.5%

Show abstract

Image registration involves finding the best alignment between different images of the same object. In these tasks, the object in question is viewed differently in each of the images (e.g. different rotation or light conditions, etc.). In digital pathology, image registration aligns correspondent regions of tissue from different stereotactic viewpoints (e.g. subsequent deeper sections of the same tissue). These comparisons are important for histological analysis and can facilitate previously unavailable manipulations, such as 3D tissue reconstruction and cell-level alignment of immunohistochemical (IHC) and special stains. Several benchmarks have been established for evaluating image registration techniques for histological tissue; however, little work has evaluated the impact of scaling registration techniques to Giga-Pixel Whole Slide Images (WSI), which are large enough for significant memory limitations, and contain recurrent patterns and deformations that hinder traditional alignment algorithms. Furthermore, as tissue sections often contain multiple, discrete, smaller tissue fragments, it is unnecessary to align an entire image when the bulk of the image is background whitespace and tissue fragments orientations are often agnostic of each other. We present a methodology for circumventing large-scale image registration issues in histopathology and accompanying software. By removing background pixels, parsing the slide into discrete tissue segments, and matching, orienting and registering smaller segment pairs, we recovered registrations with lower Target Registration Error (TRE) when compared to utilizing the unmanipulated WSI. We tested our technique by having a pathologist annotate landmarks from 13 pairs of differently stained liver biopsy slides, performing WSI and segment-based registration techniques, and comparing overall TRE. Preliminary results demonstrate superior performance of registering segment pairs versus registering WSI (difference of median TRE of 44 pixels, p<0.001). Segment matching within WSI is an effective solution for histology image registration but requires further testing and validation to ensure its viability for stain translation and 3D histology analysis.

20

A Case Study of Transfer of Lesion-Knowledge

Krishnan, S.; Khincha, R.; Vig, L.; Dash, T.; Srinivasan, A.

2020-08-22 radiology and imaging 10.1101/2020.08.19.20178210 medRxiv

Top 0.1%

4.4%

Show abstract

All organs in the human body are susceptible to cancer, and we now have a growing store of images of lesions in different parts of the body. This, along with the acknowledged ability of neural-network methods to analyse image data, would suggest that accurate models for lesions can now be constructed by a deep neural network. However an important difficulty arises from the lack of annotated images from various parts of the body. Our proposed approach to address the issue of scarce training data for a target organ is to apply a form of transfer learning: that is, to adapt a model constructed for one organ to another for which there are minimal or no annotations. After consultation with medical specialists, we note that there are several discriminating visual features between malignant and benign lesions that occur consistently across organs. Therefore, in principle, these features boost the case for transfer learning on lesion images across organs. However, this has never been previously investigated. In this paper, we investigate whether lesion knowledge can be transferred across organs. Specifically, as a case study, we examine the transfer of a lesion model from the brain to lungs and lungs to the brain. We evaluate the efficacy of transfer of a brain-lesion model to the lung, and the transfer of a lung-lesion model to the brain by comparing against a model constructed: (a) without model-transfer (i.e.random weights); and (b) using model-transfer from a lesion-agnostic dataset (ImageNet). In all cases, our lesion models perform substantially better. These results point to the potential utility of transferring lesionknowledge across organs other than those considered here.